OUTLIER ROBUST CORNER-PRESERVING METHODS FOR
RECONSTRUCTING NOISY IMAGES

By Martin Hillebrand and Christine H. Müller

Munich University of Technology and University of Kassel 

The ability to remove a large amount of noise and the abil- 
ity to preserve most structure are desirable properties of an image 
smoother. Unfortunately, they usually seem to be at odds with each 
other; one can only improve one property at the cost of the other. 
By combining M-smoothing and least-squares-trimming, the TM- 
smoother is introduced as a means to unify corner-preserving prop- 
erties and outlier robustness. To identify edge- and corner-preserving 
properties, a new theory based on differential geometry is developed. 
Further, robustness concepts are transferred to image processing. In 
two examples, the TM-smoother outperforms other corner-preserving 
smoothers. A software package containing both the TM- and the M- 
smoother can be downloaded from the Internet. 

1. Introduction. In recent years, image processing has become an impor-
tant issue due to the rapid development of digitization and its applications
in both industry and science. A fundamental operation in image process- 
ing is the reconstruction of a noisy digital image. A procedure denoising an 
image aims to achieve two objectives: 

• removing as much noise as possible; 

• preserving as much of the true signal as possible. 

These goals are difficult to achieve at the same time and, in particular, it is 
difficult to remove outliers and to preserve discontinuities simultaneously. 

Recently, several edge- and corner-preserving smoothing methods have 
been proposed. Some are methods based on wavelets and related methods 
(see, e.g., [2, 4, 5] and the references therein). Other methods are based on
special local estimators where the reconstructed pixel value is calculated by
pixel values in a neighborhood (window). Such a neighborhood is usually 
provided by a kernel function, and so these estimators are called kernel es- 
timators. Chu et al. [3] proposed the use of an M-kernel estimator based on 
a redescending objective function, while Polzehl and Spokoiny [18, 19] pro- 
posed methods based on an adaptive choice of the kernel function. However, 
none of these methods can eliminate isolated outliers, that is, none of them 
is outlier robust. The methods based on wavelets and other regularization 
methods are even known to be peak-preserving which, in particular, means 
that outliers are preserved. 

On the other hand, there are many reconstruction methods which are able 
to remove outliers. The most prominent ones in image analysis are kernel 
estimators based on outlier robust location estimators, such as the median 
smoother studied in [10] and the estimators based on least trimmed squares 
estimators studied by Meer et al. [11, 12], Rousseeuw and Van Aelst [23] 
and Müller [15, 16, 17]. But these estimators are not corner-preserving.

The ability to preserve corners and the ability to remove outliers seem to 
be contradictory properties. Some methods can have both properties, but 
not simultaneously. In these cases, one can switch from one property to the 
other by changing a few parameters. For regularization methods, this can be 
done, for example, by a high or low penalty function. For the M-estimator 
of Chu et al. [3], it depends on the scale parameter (see Figures 11 and 12). 

Although a lot of literature has been published on edge- and corner- 
preserving smoothing, there is, surprisingly, no theoretical concept of two- 
dimensional discontinuities with nondifferentiable edge curves which could 
characterize, for example, a corner as we would identify it on the basis of a 
visual impression. Instead, theory has only been developed for dimension 1, 
where discontinuities can only be jumps, or for differentiable edge curves, as 
Polzehl and Spokoiny [19] have done. 

This paper fills the gap — an intuitive differential geometric framework is 
set up in which edges and corners are properly defined. This allows for a 
definition of asymptotic corner- (resp. edge-) preserving as consistency at a 
corner (resp. edge) point. 

On the other hand, there is also an absence of any formalization of the 
quality of removing noise in terms of robustness against irregularly distributed
noise such as outliers. For this purpose, we transfer robustness concepts to 
the image analysis context. 

Having constructed such a formal framework for image smoothing, we can 
show that the M-smoother introduced by Chu et al. [3] has the remarkable 
property of being both asymptotically corner-preserving and robust for large 
samples. However, in the finite case it turns out that it is not robust against 
outliers.

Fig. 1. Original image. Fig. 2. Noisy image.



Combining the Chu smoother with a trimming procedure, we introduce 
the trimmed M-smoother (TM-smoother) which unifies the corner-preserving
properties of the Chu smoother and an excellent outlier robustness. This is 
a remarkable combination of properties; we have not found such properties 
unified in any of the other existing methods. The TM-smoother is reason- 
ably successful in distinguishing between corners and outliers. Note that the 
TM-smoother is not a two-step estimator. The M-smoothing part uses both 
the trimmed and the untrimmed data set. It is the sophisticated interplay 
of trimming and M-smoothing which gives the attractive combination of 
properties; the M-smoother alone does not remove the outliers, while the 
trimming procedure not only eliminates outliers but also "regular" values 
from corners. 

The following example illustrates the outlier robustness property com- 
bined with the good smoothing property of the TM-smoother. To the origi- 
nal image (Figure 1), noise is added (Figure 2). The TM-smoother (Figure 3) 
performs better in outlier removal than the AWS-estimator of Polzehl and 
Spokoiny [18] (Figure 4). More details on this example are given in Section 2. 

Section 2 provides more details about the M- and TM-smoothers, illus- 
trated with further examples. Hereafter, by "M-smoother" we always mean 
the redescending M-smoother introduced by Chu et al. In Section 3, edges 
and corners are defined based on a differential geometric approach. Consis- 
tency and corner preservation are treated in Section 4. Also, model assump- 
tions are discussed. In Section 5, asymptotic and nonasymptotic robustness 
concepts are transferred from location estimation to nonparametric (two-
dimensional) regression. It is shown under which conditions both the M-
and the TM-smoother are asymptotically robust and that the TM-smoother
is even outlier robust in the finite case. Section 6 outlines practical aspects
of the estimators and Section 7 summarizes the results.

Fig. 3. TM-smoother. Fig. 4. AWS.

Since the whole theoretical framework is new, there is no standard way of 
showing the different properties. The proofs in the Appendix provide insights 
about how to work with this new theory and how it is related to standard 
nonparametric regression and estimation of location. 

2. M- and TM-estimators. The reason why local estimators, especially 
robust ones, can lose their discontinuity-preserving property when trans- 
ferred from one-dimensional to two-dimensional regression can be explained 
as follows; see also Figures 5 and 6. 

In the one-dimensional case, usually (if the jumps are not too close, which
can, at least asymptotically, be assumed) the majority of the data in the
neighborhood ("window") are on the "right side" of the jump. Hence, an
estimator need only follow the majority of the data (as most robust estima-
tors do) to preserve the jump. However, in the two-dimensional case, this is
no longer a successful strategy: around a corner point, the majority of the
data is usually on the "wrong" side of the discontinuity; see Figure 6.

Fig. 5. One-dimensional discontinuity. Fig. 6. Two-dimensional discontinuity.

The M-smoother of Chu et al. [3] looks for a local mode of a density
estimate $H_{n,x}$ and hence also allows the estimator to be a "minority point".
Therefore, it is able to preserve corners. In particular, it chooses the mode
$y$ from the set of modes $\mathcal{M}_n$ of $H_{n,x}$ which is closest to the observation in
the center of the window.

More formally, we consider images given by pixel values $m(x_{ij})$ (typically
in a bounded interval of nonnegative real numbers) at pixel positions $x_{ij}$,
$i,j = 1,\dots,n$, where we can assume without loss of generality that $x_{ij} \in
[0,1]^2$. To estimate the original image $m(x)$ on the basis of the observations
$Y = (Y_{ij})_{i,j=1,\dots,n}$, where $Y_{ij} = m(x_{ij}) + \varepsilon_{ij}$ and $\varepsilon_{ij}$ is random noise, we define
the M-estimator of Chu et al. [3] by

$$\hat m_n(x) := \hat m_{n,x}(Y) := \arg\min_{y \in \mathbb{R}} \{ |y - Y_{i_0 j_0}| : y \in \mathcal{M}_n(x) \},$$

where

$$\mathcal{M}_n(x) := \{ y \in \mathbb{R} : y \text{ is a local maximum of } H_{n,x}(y) \text{ with } y < Y_{i_0 j_0} \text{ if } H'_{n,x}(Y_{i_0 j_0}) < 0 \text{ and } y > Y_{i_0 j_0} \text{ if } H'_{n,x}(Y_{i_0 j_0}) > 0 \}$$

and

$$H_{n,x}(y) := \frac{1}{n^2} \sum_{i,j=1}^{n} K_{h_n}(x - x_{ij}) L_{g_n}(y - Y_{ij}).$$

Here, $(i_0, j_0) := \arg\min_{(i,j) \in \{1,\dots,n\}^2} \|x - x_{ij}\|_2$ [if $x^k = (x^k_{ij} + x^k_{(i+1)(j+1)})/2$ for
$k = 1$ or $2$, then define $i_0 := i$ and analogously for $j_0$] and $K_{h_n}(x) :=
h_n^{-2} K(x/h_n)$, $L_{g_n}(y) := g_n^{-1} L(y/g_n)$ with kernel functions $K : \mathbb{R}^2 \to \mathbb{R}$ and
$L : \mathbb{R} \to \mathbb{R}$ and bandwidths $h_n, g_n \in (0, \infty)$, respectively. Since it is easier to
handle zeros of a function than minima, we note that $\hat m_n(x)$ is an element
of $\{ y : H'_{n,x}(y) = 0 \}$. The estimator $\hat m_n(x)$ can be calculated by means of
the Newton-Raphson method starting at the center of the window $(i_0, j_0)$
and searching for the next maximum of $H_{n,x}(y)$ in the ascending direction.
Existence and uniqueness of this estimator follow as in the one-dimensional
case (see [8]).
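The mode search can be illustrated with a minimal Python sketch for a single window. It is not the implementation used in the paper: the Gaussian score kernel, the uniform default spatial weights and the names `gauss` and `m_smooth_pixel` are assumptions made for illustration, and a fixed-point (mean-shift) iteration is swapped in for the Newton-Raphson search. With a Gaussian kernel each step moves uphill on $H_{n,x}$, so an iteration started at $Y_{i_0 j_0}$ typically ends at the nearest mode in the ascending direction, although this is not guaranteed when modes lie close together relative to $g_n$.

```python
import numpy as np


def gauss(v):
    """Standard normal density, used here as an illustrative score kernel L."""
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)


def m_smooth_pixel(window, g, weights=None, tol=1e-6, max_iter=500):
    """Climb from the centre value of `window` towards a local maximum of
    H(y) = sum_ij w_ij * L((y - Y_ij) / g), moving in the ascending direction.

    window  : odd-sized square array of observations Y_ij around the pixel
    g       : scale parameter g_n of the score kernel
    weights : spatial kernel weights K_{h_n}(x - x_ij); uniform if omitted
    """
    y_obs = np.asarray(window, dtype=float)
    if weights is None:
        w = np.ones_like(y_obs)
    else:
        w = np.asarray(weights, dtype=float)
    # Starting point: the observation at the centre of the window.
    y = y_obs[y_obs.shape[0] // 2, y_obs.shape[1] // 2]
    for _ in range(max_iter):
        k = w * gauss((y - y_obs) / g)
        # Mean-shift step; for a Gaussian kernel it points along H'(y).
        y_new = np.sum(k * y_obs) / np.sum(k)
        if abs(y_new - y) < tol:
            return float(y_new)
        y = y_new
    return float(y)
```

On a noise-free 5 × 5 window whose lower-right 3 × 3 block (including the centre) equals 100 and whose remaining pixels equal 0, the sketch returns 100 for `g = 5`, although only 9 of the 25 pixels lie inside the corner. An isolated centre value of 255 in an otherwise zero window is also returned unchanged, which mirrors the lack of finite-sample outlier robustness of the untrimmed M-smoother.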

One can imagine that the estimator will reach the wrong mode if the
starting point $Y_{i_0 j_0}$ is an outlier (see Figure 7). The basic idea of the TM-
smoother is to trim the data set from which the density estimator $H_{n,x}$ is
computed so that outliers cannot generate additional modes. Then the start-
ing point remains the same, even if it is not used for the density estimation.
If the starting point is an outlier, it needs to go a long way to the next mode
of $H_{n,x}$, which is then hopefully the correct one (see Figure 8). We will later
see that this strategy is successful.

Reducing the data set so that possible outliers are eliminated is achieved
by the trimming procedure of the least-trimmed squares (LTS) estimator
introduced by Rousseeuw [21] (see also [22]). Define the set of indices of
observations in the window which contains all positive kernel weights by

$$J_{n,x} := \{ (i,j) \in \{1,\dots,n\}^2 : \|x - x_{ij}\|_\infty < h_n \}.$$

Then the $l$-trimmed LTS-estimator is defined as

$$\hat m_{LTS,l}(x) := \arg\min_{y \in \mathbb{R}} \sum_{k=1}^{\#J_{n,x} - r} s_{(k)}(y),$$

where $(s_{(k)}(y))_{k \in \{1,\dots,\#J_{n,x}\}}$ is the order statistic of $\{ s_{ij}(y) = (y - Y_{ij})^2 :
(i,j) \in J_{n,x} \}$, $l \in (0, 0.5)$ and $r := \lfloor \#J_{n,x} \cdot l \rfloor$.

Rousseeuw and Van Aelst [23] applied the LTS-estimator to image anal-
ysis, but without formalizing the two-dimensional regression model. A de-
tailed model and a qualitative robustness analysis are provided by Müller
[15, 16, 17]. However, the LTS estimator is not corner-preserving. To ob-
tain a corner-preserving and outlier robust estimator, we do not need the
LTS-estimate itself, only the trimmed set of observations

$$R_{n,l}(x) := \{ (i,j) \in J_{n,x} : s_{ij}(\hat m_{LTS,l}(x)) \le s_{(\lceil (1-l) \cdot \#J_{n,x} \rceil)}(\hat m_{LTS,l}(x)) \}.$$

Then the trimmed M-estimator or TM-smoother is basically the M-estimator
where the density estimate is based on the trimmed data set.

Fig. 7. $H_{n,x}(y)$. Fig. 8. $H_{n,x}(y)$ based on a trimmed data set.
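The trimming step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code of the software package: the function names `lts_location` and `trimmed_mask` are hypothetical, the window is assumed to be passed as an array of the observations with positive kernel weight, and the one-dimensional LTS estimate is computed by the standard exact scan over contiguous runs of the sorted values.

```python
import numpy as np


def lts_location(values, l):
    """Exact one-dimensional least trimmed squares location estimate.

    Keeps h = N - floor(N * l) observations. The optimal subset is a
    contiguous run of the sorted values, so all N - h + 1 runs are scanned
    and the mean of the run with the smallest sum of squares is returned.
    """
    y = np.sort(np.asarray(values, dtype=float).ravel())
    n_obs = y.size
    h = n_obs - int(np.floor(n_obs * l))
    best_mean, best_ss = None, np.inf
    for start in range(n_obs - h + 1):
        run = y[start:start + h]
        mean = run.mean()
        ss = np.sum((run - mean) ** 2)
        if ss < best_ss:
            best_mean, best_ss = mean, ss
    return float(best_mean)


def trimmed_mask(window, l):
    """Boolean mask of the observations retained in the trimmed set.

    An observation is kept if its squared residual with respect to the LTS
    estimate does not exceed the h-th smallest squared residual.
    """
    y = np.asarray(window, dtype=float)
    centre = lts_location(y, l)
    sq = (y - centre) ** 2
    h = y.size - int(np.floor(y.size * l))
    threshold = np.sort(sq.ravel())[h - 1]
    return sq <= threshold
```

For a 5 × 5 window and $l = 0.15$, the sketch trims $\lfloor 25 \cdot 0.15 \rfloor = 3$ observations and keeps 22. Because the comparison uses "less than or equal", ties at the threshold can leave more than 22 observations in the retained set.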
Fig. 9. Original image. Fig. 10. Noisy image.



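Definition 1. The TM-smoother $\hat m_{n,l}(x)$ is defined as follows:

$$\hat m_{n,l}(x) := \hat m_{n,x,l}(Y) := \arg\min \{ |y - Y_{i_0 j_0}| : y \text{ is an element of the closure of } \mathcal{M}_{n,l}(x) \},$$

where

$$\mathcal{M}_{n,l}(x) := \{ y \in \mathbb{R} : y \text{ is a local maximum of } H_{n,x}(y) \text{ such that } H_{n,x}(y) > 0, \text{ with } y < Y_{i_0 j_0} \text{ if } H'_{n,x}(Y_{i_0 j_0}) < 0 \text{ and } y > Y_{i_0 j_0} \text{ if } H'_{n,x}(Y_{i_0 j_0}) > 0 \}$$

and, with the sum now restricted to the trimmed index set,

$$H_{n,x}(y) := \frac{1}{n^2} \sum_{(i,j) \in R_{n,l}(x)} K_{h_n}(x - x_{ij}) L_{g_n}(y - Y_{ij}).$$

$K_{h_n}$, $L_{g_n}$ and $(i_0, j_0)$ are defined as for the M-smoother.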

How this estimator performs in practice can be seen in the following exam-
ple, the "SUSAN" image given by Smith and Brady [25], downloaded from
www.springerlink.com/content/?k=international+of+computer+vision.
It is a 100 × 100 pixel image containing geometric figures with different kinds
of edges and corners (see Figure 9).

To each pixel, normally distributed random background noise with a stan-
dard deviation of 26 [which is about 10% of the range of values since the
brightness is linearly scaled from 0 (black) to 255 (white)] is added. In
addition to the background noise (residuals), which has expectation 0 and
bounded support, white colored outliers are added so that the model looks
like

$$Y_{ij} = (1 - \delta_{ij})(m(x_{ij}) + \varepsilon_{ij}) + \delta_{ij} \cdot 255,$$

where $\delta_{ij}$ are i.i.d. Bernoulli distributed random variables with $p = 0.01$, in
other words $\delta_{ij} \sim B(0.01)$; see Figure 10.

Fig. 11. Redescending M-smoother, $g_n = 54.5$. Fig. 12. Redescending M-smoother, $g_n = 85$.
Fig. 13. TM-smoother. Fig. 14. Adaptive weights smoother.
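The contamination model above can be simulated with a few lines of Python. The following is a sketch under assumptions: the function name `add_noise`, its parameters and the fixed seed are hypothetical, and the background noise is drawn from an untruncated normal distribution, so the bounded-support condition mentioned in the text is not enforced here.

```python
import numpy as np


def add_noise(image, sigma=26.0, p_outlier=0.01, outlier_value=255.0, seed=0):
    """Return (noisy, delta) following Y = (1 - delta) * (m + eps) + delta * 255.

    image         : array of true pixel values m(x_ij) on the 0..255 scale
    sigma         : standard deviation of the background noise eps_ij
    p_outlier     : success probability of the Bernoulli indicators delta_ij
    outlier_value : value assigned to outlier pixels (white)
    """
    rng = np.random.default_rng(seed)
    m = np.asarray(image, dtype=float)
    eps = rng.normal(0.0, sigma, size=m.shape)   # background noise
    delta = rng.random(m.shape) < p_outlier      # Bernoulli(p) outlier flags
    noisy = np.where(delta, outlier_value, m + eps)
    return noisy, delta
```

Values are not clipped to the range from 0 to 255, so that the additive structure of the model is kept; clipping would be needed before displaying the result as an 8-bit image.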

The noisy image is then smoothed by the M- or TM-smoother. In Figure
11, the M-smoother is used with parameter $g_n = 54.5$, automatically cal-
culated as the median of the interquartile ranges. We see that corners are
preserved, but so are outliers. If one increases the scale (and smooth-
ing) parameter to $g_n = 85$, then outliers are removed, but corners are too
(see Figure 12).

Applying the TM-smoother with $l = 0.15$ and automatically chosen scale
parameter $g_n = 54.5$ to the test image in Figure 10 leads to the result in
Figure 13. Now, the corners are preserved and the outliers are deleted.
The software package epsi contains the M-kernel smoother and the TM-
kernel smoother implemented as an R library and is downloadable from
cran.r-project.org.

Table 1
Mean absolute error (MAE) and mean squared error (MSE) of the reconstructed SUSAN image

Method                                      MAE            MSE
Noisy image                                 27.8           2025.0
Redescending M-smoother, $g_n = 54.5$       16.1 (-42%)    609.0 (-70%)
Redescending M-smoother, $g_n = 85$         19.2 (-31%)    662.8 (-67%)
TM-smoother, $g_n = 54.5$, $l = 0.15$       13.9 (-50%)    350.9 (-83%)
Adaptive weights smoother                   21.1 (-24%)    795.4 (-61%)

The comparison with other corner-preserving methods shows that they 
are not able to delete outliers. For example, Figure 14 provides the result 
for the adaptive weights smoother (AWS) of Polzehl and Spokoiny [18] which 
appeared in their study as one of the best corner-preserving methods. 

The existence of the original image gives us, in addition to the visual
impression, a second criterion for the performance of an estimator:
it enables us to compute the absolute and quadratic "distances" of the
smoothed noisy picture from the original, the mean absolute error (MAE)
$n^{-2} \sum_{i,j=1}^{n} |m(x_{ij}) - \hat m_n(x_{ij})|$ and the mean squared error (MSE)
$n^{-2} \sum_{i,j=1}^{n} (m(x_{ij}) - \hat m_n(x_{ij}))^2$, respectively. In Table 1, the results for
the different redescending M-kernel smoothers are given. This table also
contains the corner-preserving adaptive weights smoother (AWS) of Polzehl
and Spokoiny [18].
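The two error criteria and the percentage changes reported next to them can be computed as in the following Python sketch. The function names `mae`, `mse` and `relative_change` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def mae(original, estimate):
    """Mean absolute error between the true image and a reconstruction."""
    diff = np.asarray(original, dtype=float) - np.asarray(estimate, dtype=float)
    return float(np.mean(np.abs(diff)))


def mse(original, estimate):
    """Mean squared error between the true image and a reconstruction."""
    diff = np.asarray(original, dtype=float) - np.asarray(estimate, dtype=float)
    return float(np.mean(diff ** 2))


def relative_change(noisy_error, smoothed_error):
    """Change of an error criterion relative to the noisy image, in percent."""
    return 100.0 * (smoothed_error - noisy_error) / noisy_error
```

Applied to the Table 1 entries for the TM-smoother, `relative_change(27.8, 13.9)` rounds to -50 and `relative_change(2025.0, 350.9)` rounds to -83, in agreement with the percentages listed there.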

In the example given in the Introduction, we see that the TM-smoother
also performs well on real images where the structure of the image is more
complex. It is the "Lena image" which is famous in the image processing
community and which can be downloaded from sipi.usc.edu/database/.
To the true 512 × 512 pixel image, normally distributed background noise
with standard deviation 17 and 1.6% outliers was added: 0.8% "salt" (white
outliers) and 0.8% "pepper" (black outliers). The trimming parameter $l =
0.15$ was chosen and $h_n$ was set at 0.004, resulting in a 5 × 5 pixel window
where three data points are trimmed. The scale parameter $g_n = 25.5$ was again
automatically calculated. The MSE and MAE corresponding to Figures 3 and 4
are given in Table 2.

3. Edges and corners. While the set of discontinuities of a one-dimensional
almost everywhere continuous regression function is usually the union of
"jumps," the two-dimensional case is much more complicated. Here, the set
of discontinuities of an a.e. continuous regression function is (apart from
functions without a visual structure) a one-dimensional subset of the im-
age that can have different shapes like the borderlines in the examples in
Figure 15.

Fig. 15. Two-dimensional discontinuities.

To obtain a formal characterization of the discontinuities, we turn briefly 
to differential geometry; see, for example, [24]. 

Let $I := [a,b] \subset \mathbb{R}$ be a compact interval and let $x = (x^1, x^2)^\top : I \to \mathbb{R}^2$ be
continuous. Then the set

$$\gamma := \{ x(t) : t \in I \}$$

is called a parametrically-defined (parametrized) plane curve. The curve $\gamma$
is called regular if the derivatives of $x^1(t)$ and $x^2(t)$ exist. If the derivatives
satisfy $\|x'(t)\| = 1$ for all $t \in I$, then the curve has a natural parametrization.
A curve is called a simple or Jordan curve with respect to a given parametriza-
tion $x = x(t)$, $t \in I$, if $x(t)$ is injective on $[a,b]$ or, if the curve is closed [i.e.,
$x(a) = x(b)$], on $(a,b)$.

Table 2
Mean absolute error (MAE) and mean squared error (MSE) of the reconstructed Lena image

Method                                      MAE            MSE
Noisy image                                 16.0           598.8
TM-smoother, $g_n = 25.5$, $l = 0.15$       6.07 (-62%)    77.0 (-87%)
Adaptive weights smoother                   6.51 (-59%)    255.9 (-57%)

Fig. 16. Directional tangents.

Heretofore, we could use standard definitions. But for our highly spe- 
cialized topic of interest, we have to create some special structures. For 
geometric singularities (points where the natural parametrization is not dif- 
ferentiable) we introduce the following definition. 

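Definition 2. If $\gamma$ is a simple curve with a natural parametrization on
$I \setminus \{t_0\}$ for some $t_0 \in I$ and the limits $\lim_{t \nearrow t_0} x'(t)$ and $\lim_{t \searrow t_0} x'(t)$ exist,
then the pair of directional tangents of $\gamma$ in $x_0 = x(t_0)$ is defined as

$$\{ x_0 - \lambda \lim_{t \nearrow t_0} x'(t) : \lambda \ge 0 \} \quad \text{and} \quad \{ x_0 + \lambda \lim_{t \searrow t_0} x'(t) : \lambda \ge 0 \}.$$

Note that if $x'(t)$ is Lipschitz continuous on $I \setminus \{t_0\}$, then the directional
tangents exist, by the Cauchy criterion. In Figure 16, the directional tan-
gents, which intersect at angle $\alpha$, are represented by dotted lines.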

If $x_0$ is a regular point, then the angle between the two directional tangents
is $\alpha = \pi$ and the union of the directional tangents is equal to the tangent
at that point. But if we have a cuspidal point (see the fourth image of
Figure 15), then the directional tangents are equal and the angle between the
two directional tangents is $\alpha = 0$. Hence, "real" corners in a visual sense, such
as those in Images 2 and 3 of Figure 15, are characterized by the fact that
the angle between the two directional tangents satisfies $\alpha \in (0, \pi) \cup (\pi, 2\pi)$.
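For a polygonal curve, the angle between the directional tangents at a vertex can be computed directly from the two neighbouring vertices, as the following Python sketch shows. Several points are assumptions of the sketch and not taken from the text: the directional tangents are treated as rays starting at the vertex, the angle is measured counterclockwise from the outgoing ray to the incoming ray, and the names `corner_angle` and `classify` are hypothetical.

```python
import math


def corner_angle(p, x0, q):
    """Angle in [0, 2*pi) between the two directional tangents at vertex x0.

    p and q are the vertices before and after x0 on a polygonal curve. The
    angle is measured counterclockwise from the outgoing ray (towards q) to
    the incoming ray (towards p); this orientation is an assumption.
    """
    ang_in = math.atan2(p[1] - x0[1], p[0] - x0[0])
    ang_out = math.atan2(q[1] - x0[1], q[0] - x0[0])
    return (ang_in - ang_out) % (2.0 * math.pi)


def classify(alpha, tol=1e-9):
    """Label a point as 'regular' (alpha = pi), 'cusp' (alpha = 0) or 'corner'."""
    if alpha < tol or 2.0 * math.pi - alpha < tol:
        return "cusp"
    if abs(alpha - math.pi) < tol:
        return "regular"
    return "corner"
```

With this convention, the vertex (1, 0) of the unit square traversed counterclockwise, with neighbours (0, 0) and (1, 1), has angle $\pi/2$ and is classified as a corner, while a point on a straight segment has angle $\pi$ and is classified as regular.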

Definition 3. Let $\gamma$ be a simple curve having parametrization
$x = x(t)$, $t \in I$,
which is natural and has a bounded second derivative $x''(t)$ in some open
interval $I' \subset I$, except at a point $x_0 = x(t_0)$, $t_0 \in I'$. Then $x_0$ is called a
corner point with angle $\alpha$ if the two directional tangents of $x_0$ intersect with
angle $\alpha \in (0, \pi) \cup (\pi, 2\pi)$.

It is apparent that the corner point is well defined, that is, that the pair 
of directional tangents exists. 

Definition 4. An edge curve is a simple closed curve with a natural 
parametrization and a bounded second derivative, except at a finite number 
of corner points. 

In the following, we will consider images for which the discontinuities can
be described by edge curves; see Assumption (B2) below. This assumption
allows for the consideration of a broad variety of images and every other
image can be arbitrarily well approximated by such feasible images. For the
corner-preserving property, we need the rather strong condition (A2) which
requires that the discontinuities are always larger than the background noise.
However, other estimates are not able to preserve corners, even if there is
no noise at all. Also, we will find versions of the M-smoother (resp. TM-
smoother) which are robust against a violation of the distribution assump-
tion (A2) [resp. (A2')].

The bandwidth $g_n$ is a crucial smoothing parameter: the larger $g_n$, the
smoother the reconstructed image is; the smaller $g_n$, the more discontinuities
are preserved. Asymptotically, Chu et al. [3] suggest $g_n \to 0$. Then $H_{n,x}$
converges to the density of the distribution of the residuals $\varepsilon_{ij}$, as is shown
in detail for the one-dimensional case in [8]. But since, in this case, even
a small contamination of the residual distribution may cause a large bias
in the estimator, it is not robust, as is shown in Section 5. Therefore, we
suggest choosing a constant $g_n$ which attains robustness. This choice is also
consistent with the automatic parameter selection in our software package.
Consistency and asymptotic robustness are studied for both situations (for
$g_n \to 0$ and for constant $g_n = g$). For all asymptotic results, we assume that
the observations consist of the true signal and some additive noise, that is,
$Y_{ij} = m(x_{ij}) + \varepsilon_{ij}$.

To prove consistency and robustness for the scale parameter converging to
zero, we make the following assumptions with respect to the error distribu-
tion concerning smoothness and the number of modes (A1). For the corner-
preserving property, we need the additional assumption that the contrast at
the discontinuities is larger than the noise spread (A2).

(A1) The regression errors (background noise) $\varepsilon_{ij}$ are independent and iden-
tically distributed with a density function $f$ supported on a bounded or
unbounded interval $\mathcal{I} \subset \mathbb{R}$ such that the Lipschitz continuous deriva-
tive $f'$ has the property $f'(y) \ne 0$ for all $y \in \mathcal{I} \setminus \{0\}$ (i.e., $f$ is strongly
unimodal in 0).

(A2) As Assumption (A1), but with the additional assumption that $f$ is
supported on a bounded interval $(a_1, a_2)$ such that $a_2 - a_1 < d$ (where
$d$ is the jump height; see (B2)).

Further assumptions, collectively denoted by (B), are equidistant spacing
(B1), conditions on the shape of the image (B2), usual assumptions on the
kernel function (B3) and score function (B4) and standard asymptotic pa-
rameter choice (B5):

(B1) The design points are $x_{ij} := (\frac{i}{n}, \frac{j}{n})^\top$, $i,j = 1,\dots,n$.

(B2) The regression function is $m(x) := \mu(x) + d \, 1_D(x)$, where $m(x)$ is de-
fined on $[0,1]^2$, $\mu(x)$ is continuous on $[0,1]^2$, $d > 0$, and $D$ is a nonempty
closed set with a boundary $\partial D$ which is the disjoint union of a finite
number of edge curves. Observe that a relaxation of these assump-
tions, allowing $d = d(x)$ to be smooth in $x$ and bounded below by
some constant $d_0 > 0$, is possible.

(B3) $K(u) \ge 0$ on $(-1,1)^2$ and equals 0 elsewhere, $K(u)$ is Lipschitz con-
tinuous, $K(0) > 0$ and $\int K(u) \, du = 1$.

(B4) $L(v)$ is a nonnegative function, has a Lipschitz continuous derivative
and satisfies $L(0) \ne 0$, $\int L(v) \, dv = 1$, $\int L(v)|v| \, dv < \infty$ and $\int L'(v)|v| \, dv <
\infty$.

{B5) As 71 00, we have gn -^0, 0, n~^h~'^ —>■ and n'^h'^g'"^ —>■ 0. 

For the robust version of the estimators (fixed $g$), the assump-
tions on $f$ can be relaxed to (A1') resp. (A2'). The additional assumptions (B') differ from
those of the nonrobust case only in (B4) and (B5) due to the fixed scale
parameter.

(A1') The regression errors $\varepsilon_{ij}$ are independently and identically distributed
with density function $f$ which has bounded or unbounded support
$\mathcal{I} \subset \mathbb{R}$, which is symmetric on $[-g, g]$, strictly decreasing on $(0, \infty) \cap \mathcal{I}$
and strictly increasing on $(-\infty, 0) \cap \mathcal{I}$ [i.e., $f$ is (weakly) unimodal].

(A2') As Assumption (A1'), but with the additional assumption that the
density function $f$ is supported on the interval $(-a, a)$ and that $2a +
2g < d$.

(B4') $L$ has two Lipschitz continuous derivatives and is nonnegative, sym-
metric, supported on $(-1, 1)$ and strongly unimodal on its support:
$L'$ is positive on $(-1, 0)$. Finally, $L''$ has a finite number of zeros
in $(-1, 1)$.

{B5') = is constant and as n ^ 00, hn ^0 and n^^h^"^ 0. 
4. Corner preservation and consistency.

4.1. Consistency and corner preservation for large samples. For a loca-
tion estimator, consistency is a desirable property ensuring that the estimate
becomes better if the sample size increases. This concept can be transferred
to nonparametric regression by calling an estimator $\hat m_n(x)$ consistent at $x$
if for arbitrarily small $\varepsilon > 0$,

$$\lim_{n \to \infty} P(|\hat m_n(x) - m(x)| > \varepsilon) = 0.$$

However, consistency depends on the shape of the regression function (in 
our context, the image) around x. Usually, an image smoother is said to be 
consistent if it is consistent at points with smooth neighborhoods. In the 
context of the class of images which we consider in our theory according to 
Assumption {B2), we specify this feature as follows. 

Definition 5. An estimator $\hat m_n$ is called consistent in smooth regions
if for all $x_0 \in (0,1)^2 \setminus \partial D$ and all $\varepsilon > 0$,

$$\lim_{n \to \infty} P(|\hat m_n(x_0) - m(x_0)| > \varepsilon) = 0.$$

For local smoothers, consistency in smooth regions can be derived from 
consistency of the corresponding location estimator since observations in 
a shrinking neighborhood are then asymptotically independently and iden- 
tically distributed. Usually, the one-dimensional jump-preserving property, 
which is consistency at discontinuities (jumps) of a one-dimensional regres- 
sion function, can be transferred to consistency at smooth edge curves, that 
is, edge curves without corner points (singularities), because then, at least 
asymptotically, the majority of the observations lies on the "correct" side of 
the curve. This is the case for the estimators considered in [17], for example. 

Here, we show a substantially stronger consistency: the asymptotic preser-
vation of corners. For technical reasons, we consider boundary points $x_0 \in
\partial D$ which are rational, that is, $x_0 \in \partial D \cap \mathbb{Q}^2 =: \partial D_{\mathbb{Q}}$. Then there exists a
subsequence $(n_l)_{l \in \mathbb{N}}$ such that $x_0$ is a grid point for all $n_l$.

Definition 6. (a) An estimator $\hat m_n$ is called edge-preserving for large
samples if for all regular points $x_0 \in \partial D_{\mathbb{Q}}$ and for all $\varepsilon > 0$,

$$\lim_{l \to \infty} P(|\hat m_{n_l}(x_0) - m(x_0)| > \varepsilon) = 0.$$

(b) An estimator $\hat m_n$ is called $\alpha$-corner-preserving for large samples if for
all corner points $x_0 \in \partial D_{\mathbb{Q}}$ with angle in $(\alpha, \pi) \cup (\pi, 2\pi)$ and for all $\varepsilon > 0$,

$$\lim_{l \to \infty} P(|\hat m_{n_l}(x_0) - m(x_0)| > \varepsilon) = 0.$$

(c) An estimator $\hat m_n$ is called corner-preserving for large samples if it is
edge-preserving and $\alpha$-corner-preserving for all $\alpha \in (0, \pi) \cup (\pi, 2\pi)$.






Therefore, only edge preservation was shown for some estimators (see, e.g., 
[19]). Theorem 1 shows consistency in smooth regions (a) and even, under 
stronger assumptions, the corner-preserving property of the M-smoother (b). 

Theorem 1. (a) Let Assumptions (A1) and (B), or (A1') and (B'),
hold. Then the M-estimator of Chu et al. [3] is consistent in smooth regions.

(b) Let Assumptions (A2) and (B), or (A2') and (B'), hold. Then the
M-estimator of Chu et al. [3] is corner-preserving for large samples.

From consistency of the M-smoother, uniform consistency and conver- 
gence of the integrated mean squared error can be derived as the following 
corollary states. 

Corollary 1. Let $\hat m_n$ be the M-estimator of Chu et al. [3], let Assump-
tions (A1) and (B), or (A1') and (B'), hold and let $C \subset (0,1)^2$ be a compact
set and $U \subset (0,1)^2$ an open set with $\partial D \cap C \subset U$. Then for all $\varepsilon > 0$:

(a) $\lim_{n \to \infty} P(\sup_{x \in C \setminus U} |\hat m_n(x) - m(x)| > \varepsilon) = 0$.

Under the additional assumptions (A2) and (A2'), respectively, we even
have, for all $\varepsilon > 0$:

(b) $\lim_{n \to \infty} P(\int_C |\hat m_n(x) - m(x)|^2 \, dx > \varepsilon) = 0$;

(c) $\lim_{n \to \infty} \int \int_C |\hat m_n(x) - m(x)|^2 \, dx \, dP = 0$.

Consistency of the TM-smoother can be derived from consistency of the
M-smoother. However, one must take care that the right mode of the score
function $H_{n,x}$ is not affected by the trimming procedure. The trimming
proportion $l$ should not be too large. However, in real applications this is
not a strong restriction since outliers are typically sparse enough that a
very small $l$ can be chosen. Hence, even very small corners are preserved. To
formulate the conditions for this, let $F$ be the distribution function which
satisfies $F' = f$.

Theorem 2. (a) Let Assumptions (A1') and (B') hold and suppose $l <
\min\{F(0), 1 - F(0)\}$. Then the TM-smoother $\hat m_{n,l}$ is consistent in smooth
regions.

(b) Let Assumptions (A2') and (B') hold and suppose $l < \frac{\alpha}{2\pi} \cdot \min\{F(0), 1 -
F(0)\}$. Then the TM-smoother $\hat m_{n,l}$ is $\alpha$-corner-preserving for large samples.

The extension of Theorem 2 to uniform convergence needs consistency 
of the trimming procedure, which is implied in the consistency of the LTS- 
estimator. To our knowledge, the consistency of the LTS-estimator has only 
been shown for symmetric distributions; see [1, 26]. It seems likely that it
also holds for asymmetric distributions, but thus far, no one has succeeded
in proving it. Hence, uniform convergence and convergence of the integrated 
mean squared error cannot be carried over from Corollary 1 to the TM- 
estimator yet, although we believe that it also holds. 

4.2. Corner preservation for finite samples. In the examples, we see that 
both the TM- and the M-smoother have good corner-preserving properties 
already in the finite case. This is based on the fact that the estimator is able 
to preserve minor features of the data sample in the window. 

If we consider a corner, as sketched in Figure 6, and assume that there is no
noise, that we have only two "colors" (e.g., black and white), that the pixel
in the center of the window is inside the corner and that $g_n$ is sufficiently
small, then such a corner is preserved by the M-smoother, regardless of how
sharp the corner is. The TM-smoother requires at least $\lfloor l \cdot \#J_{n,x} \rfloor + 1$ pixels
"inside" the corner to preserve it. In practice, the latter is not a strong
assumption. In our examples, we used parameters allowing the preservation
of all corners with more than three pixels (see Section 6 for more details).

Estimators which follow the majority of the data, like many outlier robust
estimators, do not have such a strong property: in a 5 × 5 pixel window,
they need at least 13 pixels inside the corner, which therefore must have an
angle of more than $\frac{3}{4}\pi$.
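The pixel counts quoted in this subsection follow from simple arithmetic, which the short Python sketch below reproduces. The function names `min_pixels_tm` and `min_pixels_majority` are hypothetical, and a square window with an odd side length is assumed.

```python
import math


def min_pixels_tm(window_size, l):
    """Smallest number of pixels inside a corner that the TM-smoother needs,
    floor(l * #J) + 1, for a square window with #J = window_size ** 2."""
    n_window = window_size ** 2
    return math.floor(l * n_window) + 1


def min_pixels_majority(window_size):
    """Smallest number of pixels forming a strict majority of the window."""
    return (window_size ** 2) // 2 + 1
```

For a 5 × 5 window and $l = 0.15$, `min_pixels_tm(5, 0.15)` gives 4, that is, corners with more than three pixels, whereas `min_pixels_majority(5)` gives 13.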

5. Robustness. 

5.1. Large sample robustness. Besides the question of asymptotic "cor- 
rectness" of the estimator under certain assumptions (which is answered 
by consistency), it is of interest to consider how the estimator is influenced 
by a violation of the assumptions, in particular by a contamination of the 
distribution of the error noise (this also includes outliers). 

For estimation of location, Hampel [6] introduced large sample robustness 
(see also [9]). An estimator is called robust for large samples if a small 
contamination of the distribution of the observations causes only a small 
bias of the estimator asymptotically. 

We transfer this concept from the location case to nonparametric regres-
sion. For a precise definition, we need the Lévy metric on the space $\mathcal{P}$ of
probability measures on $\mathbb{R}$,

$$d_L(P, Q) := \min\{ \varepsilon : F(y - \varepsilon) - \varepsilon \le G(y) \le F(y + \varepsilon) + \varepsilon \text{ for all } y \in \mathbb{R} \},$$

where $F$ and $G$ are the distribution functions of the probability measures $P$
and $Q$, respectively. The $\varepsilon$-Lévy neighborhood of $P$ is defined as

$$U_{L,\varepsilon}(P) = \{ Q \in \mathcal{P} : d_L(P, Q) < \varepsilon \}.$$
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Let m:JcM^^/CM, x ^ m{x), be a regression function and let Y : = 
(^i)*j=iv,"' where Yij are observations at Xij £ J. For the estimator rhn^x ■ 
j^nxn ^]^^ let [p)^nMY) be the distribution of m„,^(y) if P is the distri- 
bution of the i.i.d. residuals Yij — m{xij). 
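
The Lévy metric can be approximated numerically. The following is a minimal sketch, not part of the original software: it assumes the two distribution functions are available as vectorized callables, checks the defining inequalities only on a finite grid, and finds the smallest admissible value by bisection. The names `levy_distance`, `normal_cdf` and `contaminated_cdf` are hypothetical, and the contaminated example (5% point mass at 10) is made up for illustration.

```python
import math
import numpy as np

def levy_distance(F, G, grid, tol=1e-6):
    """Approximate d_L(P, Q) by bisection on eps, checking
    F(y - eps) - eps <= G(y) <= F(y + eps) + eps on a finite grid only."""
    Gy = G(grid)

    def admissible(eps):
        lower_ok = np.all(F(grid - eps) - eps <= Gy)
        upper_ok = np.all(Gy <= F(grid + eps) + eps)
        return bool(lower_ok and upper_ok)

    lo, hi = 0.0, 1.0  # eps = 1 is always admissible for distribution functions
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if admissible(mid):
            hi = mid
        else:
            lo = mid
    return hi

_erf = np.vectorize(math.erf)

def normal_cdf(y):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + _erf(np.asarray(y, dtype=float) / math.sqrt(2.0)))

def contaminated_cdf(y, eps=0.05, outlier=10.0):
    """(1 - eps) * N(0, 1) plus a point mass eps at `outlier` (illustrative choice)."""
    y = np.asarray(y, dtype=float)
    return (1.0 - eps) * normal_cdf(y) + eps * (y >= outlier)

grid = np.linspace(-12.0, 12.0, 4801)
d_same = levy_distance(normal_cdf, normal_cdf, grid)
d_cont = levy_distance(normal_cdf, contaminated_cdf, grid)
```

In this sketch a contamination of mass 0.05 stays within Lévy distance of about 0.05 of the uncontaminated distribution, however far away the outlying mass is placed. This is the sense in which outliers count as a "small" contamination.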

Definition 7. The estimator $\hat m_{n,x}(Y)$ is called robust for large samples
at $P$ in $x$ if for all $\varepsilon^* > 0$, there exist $\varepsilon > 0$ and $N$ such that

$$d_L\big((P)^{\hat m_{n,x}(Y)}, (Q)^{\hat m_{n,x}(Y)}\big) < \varepsilon^* \quad \text{for all } Q \in U_{L,\varepsilon}(P) \text{ and } n > N.$$

Note that we use the shorter form $m_n(x)$, instead of $\hat m_{n,x}(Y)$, in the
remainder of the paper.

The basic idea of the M-smoother of Chu et al. [3] is that one searches for
a local mode of a density estimate $H_{n,x}$. Chu et al. suggest $g_n \to 0$ since then
$H_{n,x}$ converges to the density function of the distribution of the residuals.
But a small change in the noise distribution may cause an additional mode
in the density function, thereby causing a fairly large bias of the estimator.

This observation led to the analysis of the asymptotics of the estimator
with a constant scale parameter $g_n$. In this case, $H_{n,x}$ no longer converges to
the density function, but rather to some other function $h$ which has the mode at
the same place as the density, but which is less sensitive to contamination
of the noise distribution.
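
The effect described above can be illustrated numerically. The sketch below is an illustration under made-up assumptions, not the authors' construction: the noise density is standard normal, the contamination is a narrow bump of mass 0.02 at 2, and the scale parameter is fixed at 1. It counts the local maxima of the contaminated density and of its convolution with the truncated Gaussian kernel $L$, that is, of $h(y) = \int L(y - u) P(du)$ computed for the contaminated distribution. The helper names are hypothetical.

```python
import numpy as np

def count_modes(values, rel_height=1e-3):
    """Count interior local maxima higher than rel_height * max(values)."""
    mid = values[1:-1]
    is_peak = (mid > values[:-2]) & (mid >= values[2:]) & (mid > rel_height * values.max())
    return int(np.count_nonzero(is_peak))

def gauss(y, mean=0.0, sd=1.0):
    """Gaussian density with the given mean and standard deviation."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

step = 0.001
y = np.arange(-8.0, 8.0 + step / 2, step)

eps = 0.02                                   # contamination mass (illustrative)
f_eps = (1.0 - eps) * gauss(y) + eps * gauss(y, mean=2.0, sd=0.05)

u = np.arange(-1.0, 1.0 + step / 2, step)
L = gauss(u)                                 # Gaussian kernel set to zero outside [-1, 1]

# Riemann-sum approximation of h(y) = int L(y - u) f_eps(u) du with fixed scale 1
h_eps = np.convolve(f_eps, L, mode="same") * step

modes_density = count_modes(f_eps)           # modes of the contaminated density
modes_smoothed = count_modes(h_eps)          # modes of the limit for a fixed scale
```

In this example the contaminated density has an additional mode at the bump, whereas the function obtained with a fixed scale parameter remains unimodal.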

The following theorems summarize these results. Their proofs give inter- 
esting insights as to how we can analyze large sample robustness properties 
in nonparametric regression if standard methods cannot be applied. 

Theorem 3. (a) Let Assumptions (B) hold and let $P$ be a distribution
satisfying (A1). Further, let $x_0 \in (0,1)^2$. Then the M-estimator $m_n(x_0)$ of
Chu et al. [3] is not robust at $P$ in $x_0$ for large samples.

(b) Let Assumptions (B') hold and let $P$ be a distribution satisfying (A1').
Let $x_0 \in (0,1)^2 \setminus \partial D$. Then the M-estimator $m_n(x_0)$ of Chu et al. [3] is
pointwise robust for large samples at $P$ in $x_0$.

If $P$ satisfies (A2') and $x_0$ is a grid point for some $n \in \mathbb{N}$, then the
estimator is even robust at corners $x_0 \in \partial D$.

Also, according to the following theorem, the TM-smoother is robust for 
large samples. 

Theorem 4. Let Assumptions (B') hold and let $P$ be a distribution
satisfying (A1'). Let $x_0 \in (0,1)^2 \setminus \partial D$. Then the TM-estimator $m_{n,l}(x_0)$ is
pointwise robust for large samples at $P$ in $x_0$.

The interesting result that large sample robustness of the M- and the
TM-smoother crucially depends on the asymptotic choice of the scale
parameter gives important information about a sensible parameter choice.
The median of the interquartile ranges of the observations in the windows
turns out to be a good choice for an automatic parameter selection. Moreover,
it converges to some constant $g$ and hence fits into the large sample
robust modeling scheme.

5.2. Finite sample robustness. However, a very special kind of distribution
contamination, that is, the presence of outliers, causes problems for the
M-smoother: as can be seen in Figure 11, the M-estimator is not capable of
removing outliers with a sensible parameter choice. One can choose
a much larger smoothing parameter $g_n$, but then corners are no longer
preserved. Apparently, the asymptotic robustness property does not take effect
in this case.

Hence, we need another robustness concept characterizing estimators which 
are able to remove outliers, such as the TM-smoother. 

In 1971, Hampel [6] introduced a quantitative robustness measure called
the breakdown point of an estimator. It is the minimal fraction of observations
which must be arbitrarily biased so that the estimator tends to $\pm\infty$.
The extension of this concept to linear models can be found in Müller [14].
The special case of a breakdown point in two-dimensional nonparametric
regression is treated in [16].

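Definition 8. Let $x \in [h_n, 1 - h_n]^2$ and let

$$(y)_{J_{n,x}} := \{y_{ij} : (i,j) \in J_{n,x}\}$$

be the set of observations in the window $U_{h_n}(x)$. Let

$$\mathcal{Y}_{n,x,r}(y) := \{(z)_{J_{n,x}} : z_{ij} \ne y_{ij} \text{ for at most } r \text{ of the } z_{ij}\}.$$

Then the maximum bias of an estimator $\hat m_{n,x}$ by replacing $r$ observations
of $(y)_{J_{n,x}}$ is defined as

$$B(\hat m_{n,x}, (y)_{J_{n,x}}, r) := \sup_{(z)_{J_{n,x}} \in \mathcal{Y}_{n,x,r}(y)} \big|\hat m_{n,x}((z)_{J_{n,x}}) - \hat m_{n,x}((y)_{J_{n,x}})\big|.$$

The breakdown point of $\hat m_{n,x}$ by replacing observations of $(y)_{J_{n,x}}$ is defined
as

$$\varepsilon^*(\hat m_{n,x}, (y)_{J_{n,x}}) := \min\Big\{\frac{r}{\#J_{n,x}} : r \in \mathbb{N} \text{ with } B(\hat m_{n,x}, (y)_{J_{n,x}}, r) = \infty\Big\}$$

and the breakdown point of $\hat m_{n,x}$ by replacing observations is defined as

$$\varepsilon^*(\hat m_{n,x}) := \min\{\varepsilon^*(\hat m_{n,x}, (y)_{J_{n,x}}) : (y)_{J_{n,x}} \in \mathbb{R}^{\#J_{n,x}}\}.$$

We can now define outlier robustness.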

Definition 9. An estimator $m_n(x) = \hat m_{n,x}$ is called outlier robust if its
breakdown point by replacing observations is larger than $1/\#J_{n,x}$.
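
The breakdown point by replacing observations can be explored empirically. The sketch below is a hedged illustration and not taken from the paper: it uses the sample mean and the sample median as stand-in estimators, replaces observations by a single huge value instead of taking a supremum over all replacements, and declares "breakdown" once the estimate moves by more than a large bound. The function name `replacement_breakdown` and the data are hypothetical.

```python
import numpy as np

def replacement_breakdown(estimator, sample, bound=1e6, outlier=1e12):
    """Smallest fraction r / n of replaced observations that moves the
    estimate by more than `bound` (heuristic stand-in for an infinite bias)."""
    n = sample.size
    base = estimator(sample)
    for r in range(1, n + 1):
        corrupted = sample.copy()
        corrupted[:r] = outlier              # replace r observations
        if abs(estimator(corrupted) - base) > bound:
            return r / n
    return None

window = np.linspace(0.0, 1.0, 25)           # 25 made-up observations of a 5 x 5 window
bp_mean = replacement_breakdown(np.mean, window)
bp_median = replacement_breakdown(np.median, window)
```

With 25 observations, the mean breaks down after a single replacement (fraction 1/25), which is exactly the threshold excluded by Definition 9, whereas the median tolerates 12 replaced observations.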

Although redescending M-estimators are known to have high breakdown
points (see, e.g., [13, 20]), this does not hold for the M-estimator of Chu et
al. [3]. This is because a high breakdown point is only achieved if the
M-estimator is defined as the global maximum of the score function. As soon as
the M-estimator is defined as some local maximum, such as the M-estimator
of Chu et al. [3], the high breakdown point property is lost: it is obvious
that the M-estimator of Chu et al. [3] has a breakdown point of $1/\#J_{n,x}$.
However, the TM-estimator is outlier robust, as we see from the following
theorem.

Theorem 5. Let Assumptions (B3) and (B4') hold and let $x \in [h_n, 1 - h_n]^2$. Then:

(a) the M-estimator of Chu et al. [3] is not outlier robust;

(b) if $l \in [1/\#J_{n,x}, 1/2)$, then the TM-estimator $m_{n,l}$ is outlier robust
and, in particular,

$$\varepsilon^*(m_{n,l}(x)) > l.$$

This means that even if a fraction $l$ of the observations in the window are
outliers, the estimator does not "break down." We believe that a stronger
result also holds: that the TM-smoother is even consistent in the presence
of a fraction of outliers not larger than $l$.

We can now clearly see the role of the parameter $l$: it removes outliers if
they are at most $100 \cdot l\,\%$ of the observations in the window and it preserves
corners if they contain more than $100 \cdot l\,\%$ of the observations; see also Section 4.2.

6. Computational aspects. Both the M-smoother and the TM-smoother
are contained in the R package epsi. Since the TM-smoother has a considerably
better robustness property and a smoothing quality which is no worse,
it is preferable to the M-smoother. In any case, the M-smoother can be
regarded as a special TM-smoother with $l = 0$.

Like Chu et al. [3], we use the Gaussian density with mean 0 and standard
deviation 1 as the kernel function $L$ and the product density of the
same distribution as the kernel function $K$. Also, $L$ and $K$ are set to zero
outside $[-1, 1]$ and $[-1, 1]^2$, respectively. The estimator can be used without
choosing any parameters; reasonable parameters are set as default values.
The window size (determined by $h_n$) turns out to be optimal at $5 \times 5$ for
images of up to 10^ pixels which gives, for example, $h_n = 0.02$ for a $100 \times 100$
pixel image. For a $5 \times 5$ window, we use $l = 0.15$ as the trimming proportion,
which means that three observations of the window are trimmed. This
implies that any corner consisting of more than three pixels is preserved (cf.
Figure 6). The scale smoothing parameter $g_n$ is calculated automatically as
the median of the interquartile ranges. Although it is a good choice, one can
sometimes improve the result slightly by adjusting the parameter manually.
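
The default parameters described above can be reproduced with a few lines of arithmetic. The following sketch is an assumption-laden illustration, not the code of the epsi package: the function names are hypothetical, the interquartile range is computed with NumPy's default percentile interpolation, only windows lying completely inside the image are used, and no normalizing constant is applied to the scale.

```python
import numpy as np

def trimmed_count(window_pixels, trim_fraction):
    """Number of observations removed from a window: floor(l * #J)."""
    return int(np.floor(trim_fraction * window_pixels))

def median_iqr_scale(image, half_width=2):
    """Median, over all full windows, of the interquartile range of the pixel values."""
    k = 2 * half_width + 1
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    flat = windows.reshape(-1, k * k)
    q75, q25 = np.percentile(flat, [75, 25], axis=1)
    return float(np.median(q75 - q25))

# h_n = 0.02 on a 100 x 100 image corresponds to 2 pixels on each side of the centre
half_width = round(0.02 * 100)
n_trimmed = trimmed_count((2 * half_width + 1) ** 2, 0.15)   # 3 of 25 observations
min_corner_pixels = n_trimmed + 1                            # corners need at least 4 pixels
```

For a noisy image stored as a two-dimensional float array `img`, a call such as `median_iqr_scale(img)` would give a candidate value for the scale parameter under these assumptions.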

The algorithm can be sketched as follows: 

For each pixel,

• trim the data set in the window;

• calculate $H_{n,x}$ based on the trimmed data set;

• choose the starting point $Y_{i_0 j_0}$ from the complete data set and find the
closest local maximum of $H_{n,x}$ in the ascending direction. If the gradient
is zero at the starting point and $H_{n,x}$ is zero at this point (which is
typically the case when the starting point is an outlier), then search in both
directions for the closest local maximum.

For the maximization, we used a Newton-Raphson algorithm. The choice of
step size is essential; we used the Armijo step size, since otherwise the
convergence is poor and the algorithm is slow. The algorithm is implemented in
C++. The source code can be found in the R package epsi.
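
The three steps of the algorithm can be sketched for a single window as follows. This is a simplified illustration under stated assumptions and not the implementation in epsi: the trimming is done by one-dimensional least trimmed squares on the pixel values, the kernels are the truncated Gaussian densities described above without renormalization, and the Newton-Raphson iteration with Armijo step size is replaced by a plain hill climb on a fine grid. All function names are hypothetical.

```python
import numpy as np

def lts_keep_mask(values, trim_fraction):
    """Mark the h = n - floor(l * n) observations kept by 1-D least trimmed squares."""
    n = values.size
    h = n - int(np.floor(trim_fraction * n))
    order = np.argsort(values, kind="stable")
    s = values[order]
    # In one dimension the LTS subset is a run of h consecutive sorted values:
    # choose the run with the smallest variance around its own mean.
    start = min(range(n - h + 1), key=lambda a: float(np.var(s[a:a + h])))
    mask = np.zeros(n, dtype=bool)
    mask[order[start:start + h]] = True
    return mask

def truncated_gauss(u):
    """Standard Gaussian density set to zero outside [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi), 0.0)

def tm_smooth_window(window, g, trim_fraction=0.15):
    """Estimate the centre pixel of a square window with odd side length.
    trim_fraction = 0 gives an M-type smoother without trimming."""
    k = window.shape[0]
    values = window.astype(float).ravel()
    axis = np.linspace(-1.0, 1.0, k)                      # rescaled pixel offsets
    weights = np.outer(truncated_gauss(axis), truncated_gauss(axis)).ravel()
    keep = lts_keep_mask(values, trim_fraction)           # step 1: trim
    kept_values, kept_weights = values[keep], weights[keep]

    # Step 2: evaluate H on a fine grid covering its support.
    lo, hi = kept_values.min() - g, kept_values.max() + g
    grid = np.linspace(lo, hi, int(np.ceil((hi - lo) / (g / 200.0))) + 1)
    H = (kept_weights * truncated_gauss((grid[:, None] - kept_values) / g)).sum(axis=1)

    # Step 3: start at the centre pixel (from the complete data) and climb.
    start_value = values[values.size // 2]
    i = int(np.argmin(np.abs(grid - start_value)))
    if H[i] == 0.0:                                       # e.g. the centre is a trimmed outlier
        positive = np.flatnonzero(H > 0.0)
        i = int(positive[np.argmin(np.abs(positive - i))])
    while True:
        left = H[i - 1] if i > 0 else -np.inf
        right = H[i + 1] if i < grid.size - 1 else -np.inf
        if max(left, right) <= H[i]:
            return float(grid[i])
        i = i - 1 if left > right else i + 1
```

In this sketch, a noise-free corner of four pixels that contains the centre pixel is preserved with $l = 0.15$, an isolated outlier at the centre is removed with $l = 0.15$, and the same outlier is retained with $l = 0$.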

7. Conclusion. The TM-smoother is a good choice for images with low 
level background noise, outliers, edges and corners. It is able to preserve 
corners and edges, and between the discontinuities, it has good smoothing 
properties, even if the image is not homogeneous as is desirable, for example, 
for the AWS-estimator of Polzehl and Spokoiny [18, 19]. However, its quality 
becomes worse if the variance of the background noise is too large. In that 
case, another method such as the AWS-estimator may be a better choice. 

This article also provides a theoretical framework for image reconstruc- 
tion which has not previously existed. By means of several properties, local 
smoothers can be compared with respect to smoothing/preservation quality 
and robustness. The proofs provide insight into how we can work with this 
theory. 

APPENDIX A: PROOFS 

For the proofs of the theorems, we need the following lemmas. Particularly
essential for the proof of Theorem 1 is Lemma A.1. It claims that the sum of
the kernel weights of the pixel positions in $D$ converges for $x_0 \in \partial D$. For this
purpose, let $U_{h_n}(x_0) := \{x \in [0,1]^2 : \|x_0 - x\|_\infty \le h_n\}$ be the window around
$x_0$ with respect to $h_n$ and let $G_n(x_0) := D \cap U_{h_n}(x_0)$. If $x_0 \in (0,1)^2 \setminus \partial D$, then
$G_n(x_0) = \emptyset$ or $G_n(x_0) = U_{h_n}(x_0)$ for sufficiently large $n$. If $x_0 \in \partial D$, then
$\emptyset \ne G_n(x_0) \subsetneq U_{h_n}(x_0)$ for all $n \in \mathbb{N}$ (see Figure 17).

Lemma A.1. Let $x_0 \in \partial D$ and let Assumptions (B1), (B2), (B3) and
(B5), or Assumptions (B1), (B2), (B3) and (B5'), hold. Then there exists
$G(x_0) \subset [-1, 1]^2$ such that

$$\frac{1}{n^2} \sum_{(i,j):\, x_{ij} \in G_n(x_0)} K_{h_n}(x_0 - x_{ij}) = \int_{G(x_0)} K(u)\,du + o(1)$$

and $1 > \int_{G(x_0)} K(u)\,du > 0$.

Proof. We prove Lemma A.1 only for corner points $x_0$. It is apparent
that the proof for consistency at corner points also holds for regular points.
For some fixed $n_0 \in \mathbb{N}$, let the set of discontinuities $\partial D \cap U_{h_{n_0}}(x_0)$ be
described by the edge curve $x(t)$ and let $t_0 \in I$ be such that $x(t_0) = x_0$. In the
following proof, we always assume $n > n_0$ and $n_0$ is sufficiently large so that
$x_0$ is the only corner point in $\partial D \cap U_{h_{n_0}}(x_0)$.

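Let $b_l := \lim_{t \uparrow t_0} x'(t)$ and $b_r := \lim_{t \downarrow t_0} x'(t)$. Then

$$T_l(\gamma, x_0) = \{z \in \mathbb{R}^2 : z = x_0 - \lambda \cdot b_l,\ \lambda \in [0, \infty)\}$$

and

$$T_r(\gamma, x_0) = \{z \in \mathbb{R}^2 : z = x_0 + \lambda \cdot b_r,\ \lambda \in [0, \infty)\}.$$

Recall that $\|b_l\|_2 = \|b_r\|_2 = 1$ because $x(t)$ is a natural parametrization.
Consider the rotation $\Theta : \mathbb{R}^2 \to \mathbb{R}^2$ that maps

$$c := \frac{b_l + b_r}{\|b_l + b_r\|_2},$$

which is the normalized sum of the direction vectors $b_l$, $b_r$ of
the directional tangents of $x_0$, onto the $x^1$-axis, that is, $\Theta(c) = e_1$; see Figure 18.
Rotated quantities are marked with a tilde, for example $\tilde b_r := \Theta(b_r)$ and
$\tilde x(t) := \Theta(x(t))$, and superscripts denote coordinates.

Fig. 18. Rotation $\Theta$.

Observe that $\tilde b_l^1 > 0$ and $\tilde b_r^1 > 0$. This can be seen as follows. By the
Cauchy-Schwarz inequality,

$$\langle b_r, b_r + b_l \rangle = \|b_r\|_2^2 + \langle b_r, b_l \rangle \ge \|b_r\|_2^2 - |\langle b_r, b_l \rangle| > 0.$$

Hence,

$$\langle b_r, c \rangle = \Big\langle b_r, \frac{1}{\|b_r + b_l\|_2}(b_r + b_l) \Big\rangle > 0$$

and since $\Theta$ is a rotation,

$$\tilde b_r^1 = \langle \tilde b_r, e_1 \rangle = \langle \Theta(b_r), \Theta(c) \rangle = \langle b_r, c \rangle > 0.$$

$\tilde b_l^1 > 0$ is shown analogously.

This, together with the Lipschitz continuity of $x'(t)$, implies that there
exists a neighborhood $U_\varepsilon(t_0)$ of $t_0$ such that $(\tilde x^1)'(t) > 0$ on $U_\varepsilon(t_0)$ and hence
$\tilde x^1$ is invertible there. Then there exists, with $U_1 := \tilde x^1(U_\varepsilon(t_0))$, a function
$g : U_1 \to \mathbb{R}$ such that $g(\tilde x^1(t)) = \tilde x^2(t)$ for all $t \in U_\varepsilon(t_0)$ and which is twice
differentiable on $U_1 \setminus \{\tilde x_0^1\}$.

The function $g$ can be given explicitly as

$$g(z) = \tilde x^2\big((\tilde x^1)^{-1}(z)\big)$$

for $z \in U_1$. Hence, for $z \in U_1 \setminus \{\tilde x_0^1\}$,

$$g'(z) = \frac{(\tilde x^2)'\big((\tilde x^1)^{-1}(z)\big)}{(\tilde x^1)'\big((\tilde x^1)^{-1}(z)\big)}$$

and

$$\lim_{z \uparrow \tilde x_0^1} g'(z) = \frac{\tilde b_l^2}{\tilde b_l^1} =: \beta_l, \qquad \lim_{z \downarrow \tilde x_0^1} g'(z) = \frac{\tilde b_r^2}{\tilde b_r^1} =: \beta_r.$$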

Since the curve is simple, there exists, for sufficiently small $\varepsilon$, a
neighborhood $U_2 \subset \mathbb{R}$ such that

$$\{\tilde x(t) : t \in I\} \cap (U_1 \times U_2) = \{(\tilde x^1, g(\tilde x^1)) : \tilde x^1 \in U_1\}$$

and $\tilde x_0^2$ lies in the interior of $U_2$. Without loss of generality, assume that
$\Theta(D) \cap (U_1 \times U_2)$ lies beneath $g$, that is, $\Theta(D) \cap (U_1 \times U_2) = \{x \in U_1 \times U_2 : x^2 < g(x^1)\}$.
Then there exists $n_1 > n_0$ such that $\Theta(U_{h_n}(x_0)) \subset U_1 \times U_2$ for
all $n > n_1$ and hence $G_n(x_0) = D \cap U_{h_n}(x_0) = \{u \in U_{h_n}(x_0) : \tilde u^2 < g(\tilde u^1)\}$.

Fig. 19. $\varphi_{n,x_0}$.

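Moreover, there exist two Taylor expansions of $g$ at $\tilde x_0^1$,

$$\tilde x^2 = g(\tilde x^1) = \tilde x_0^2 + (\tilde x^1 - \tilde x_0^1)\beta_l + (\tilde x^1 - \tilde x_0^1)\,\eta_l(\tilde x^1 - \tilde x_0^1) \quad \text{for } \tilde x^1 < \tilde x_0^1$$

and

$$\tilde x^2 = g(\tilde x^1) = \tilde x_0^2 + (\tilde x^1 - \tilde x_0^1)\beta_r + (\tilde x^1 - \tilde x_0^1)\,\eta_r(\tilde x^1 - \tilde x_0^1) \quad \text{for } \tilde x^1 > \tilde x_0^1,$$

where

$$\lim_{u \to 0} \eta_i(u) = 0 \quad \text{for } i = l, r.$$

Define the transformation

$$\varphi_{n,x_0} : u \mapsto \frac{1}{h_n}(x_0 - u).$$

It maps the window $U_{h_n}(x_0)$, which contains the support of the kernel
function, onto the (mirror) unit square; see Figure 19.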
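Now, define

$$B_n(x_0) := \{u \in U_{h_n}(x_0) : \tilde u^2 < \tilde x_0^2 + (\tilde u^1 - \tilde x_0^1)(\beta_l 1_{(-\infty, \tilde x_0^1]}(\tilde u^1) + \beta_r 1_{(\tilde x_0^1, \infty)}(\tilde u^1))\}.$$

$B_n(x_0)$ is the area which lies, with respect to the rotated axes, "beneath"
the directional tangents of $x_0$; see Figure 20.
Further, we have

$$\varphi_{n,x_0}(B_n(x_0)) = \{u \in [-1, 1]^2 : x_0 - h_n u \in B_n(x_0)\}$$

$$= \{u \in [-1, 1]^2 : \tilde u^2 > \tilde u^1 [\beta_l 1_{[0,\infty)}(\tilde u^1) + \beta_r 1_{(-\infty,0)}(\tilde u^1)]\}.$$

Since $\varphi_{n,x_0}(B_n(x_0))$ is independent of $n$, we can rename it as

$$G(x_0) := \varphi_{n,x_0}(B_n(x_0));$$

see Figure 21. Now, consider, with the Taylor expansions mentioned above,

$$\tilde G_n(x_0) := \varphi_{n,x_0}(G_n(x_0)) = \{u \in [-1, 1]^2 : \Theta(x_0 - h_n u)^2 < g(\Theta(x_0 - h_n u)^1)\}$$

$$= \{u \in [-1, 1]^2 : \tilde u^2 > \tilde u^1 [(\beta_l + \eta_l(-h_n \tilde u^1)) 1_{[0,\infty)}(\tilde u^1) + (\beta_r + \eta_r(-h_n \tilde u^1)) 1_{(-\infty,0)}(\tilde u^1)]\}.$$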

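Define

$$\eta_{\max,n} := \max\Big\{\max_u |\eta_l(u)|, \max_u |\eta_r(u)|\Big\},$$

where the inner maxima are taken over the arguments $u = -h_n \tilde u^1$ occurring in $\tilde G_n(x_0)$.

Since

$$\tilde G_n(x_0) \,\triangle\, G(x_0) \subset \{u \in [-1, 1]^2 : |\tilde u^2 - \tilde u^1(\beta_l 1_{[0,\infty)}(\tilde u^1) + \beta_r 1_{(-\infty,0)}(\tilde u^1))| \le \eta_{\max,n}\},$$

where the symmetric difference is defined as $A \,\triangle\, B := (A \setminus B) \cup (B \setminus A)$,
the Lebesgue measure of the symmetric difference can be estimated by
$\lambda(\tilde G_n(x_0) \,\triangle\, G(x_0)) \le 6\eta_{\max,n} = o(1)$ as $n \to \infty$. It follows immediately that

$$\int_{\tilde G_n(x_0)} K(u)\,du = \int_{G(x_0)} K(u)\,du + o(1)$$

since $K$ is bounded. Hence, it suffices to show that

$$\lim_{n \to \infty} \Big( \frac{1}{n^2} \sum_{(i,j):\, x_{ij} \in G_n(x_0)} K_{h_n}(x_0 - x_{ij}) - \int_{\tilde G_n(x_0)} K(u)\,du \Big) = 0.$$

For the proof of this property and other technical details of the proof, see
[7]. □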
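
The statement of Lemma A.1 can be checked numerically in a simple special case. The sketch below rests on several assumptions that are not taken from the paper: $D = [0.5, 1]^2$ with a right-angled corner at $x_0 = (0.5, 0.5)$, pixel centres at $((i - 0.5)/n, (j - 0.5)/n)$, the uniform kernel $K = 1/4$ on $[-1, 1]^2$ with $K_h(x) = K(x/h)/h^2$, and the bandwidth sequence $h_n = n^{-1/3}$. In this setting $G(x_0) = [-1, 0]^2$, so the limit is $1/4$. The function name is hypothetical.

```python
import numpy as np

def weight_sum_in_quadrant(n, h):
    """(1 / n^2) * sum of K_h(x0 - x_ij) over the pixels of D = [0.5, 1]^2
    inside the window around x0 = (0.5, 0.5), for the uniform kernel K = 1/4."""
    x0 = 0.5
    centres = (np.arange(1, n + 1) - 0.5) / n        # assumed pixel positions
    in_window = np.abs(centres - x0) <= h
    in_D = centres >= x0
    # D and the window are both products, so counting per coordinate suffices
    count_1d = int(np.count_nonzero(in_window & in_D))
    return (count_1d ** 2) * 0.25 / (n * h) ** 2

err_small = abs(weight_sum_in_quadrant(100, 100 ** (-1 / 3)) - 0.25)
err_large = abs(weight_sum_in_quadrant(10000, 10000 ** (-1 / 3)) - 0.25)
```

The deviation from $1/4$ shrinks as $n$ grows while $h_n \to 0$ and $n h_n \to \infty$, in line with the $o(1)$ term of the lemma.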







We define the set of indices in $J_{n,x_0} = \{(i,j) \in \{1, \dots, n\}^2 : \|x_0 - x_{ij}\|_\infty \le h_n\}$
corresponding to $D$ by

$$I_n^D(x_0) := \{(i,j) \in \{1, \dots, n\}^2 : x_{ij} \in G_n(x_0)\}.$$

Observe that for all $(i,j) \in J_{n,x_0} \setminus I_n^D(x_0)$, we have $m(x_{ij}) = \mu(x_{ij})$ and for
all $(i,j) \in I_n^D(x_0)$, we have $m(x_{ij}) = \mu(x_{ij}) + d$.
We then have the following corollary.

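Corollary A.1.

$$\frac{1}{n^2} \sum_{(i,j) \in J_{n,x_0} \setminus I_n^D(x_0)} K_{h_n}(x_0 - x_{ij}) = 1 - \int_{G(x_0)} K(u)\,du + o(1).$$

Note that the equalities in Lemma A.1 and Corollary A.1 also hold for
$x_0 \in (0,1)^2 \setminus \partial D$. If $x_0 \in D \setminus \partial D$, then $\int_{G(x_0)} K(u)\,du = 1$ and if $x_0 \in (0,1)^2 \setminus D$,
then $\int_{G(x_0)} K(u)\,du = 0$.

Define

$$\nu_{x_0} := \int_{G(x_0)} K(u)\,du$$

and for the case that the scale parameter is converging to zero,

$$f_{d,\nu_{x_0}}(y) := \begin{cases} \nu_{x_0} f(y) + (1 - \nu_{x_0}) f(y + d), & \text{for } \nu_{x_0} \in (0,1), \\ f(y), & \text{for } \nu_{x_0} = 1 \text{ or } \nu_{x_0} = 0. \end{cases}$$

For the case that the scale parameter is fixed by $g_n = 1$, define

$$h_{d,\nu_{x_0}}(y) := \begin{cases} \nu_{x_0} h(y) + (1 - \nu_{x_0}) h(y + d), & \text{for } \nu_{x_0} \in (0,1), \\ h(y), & \text{for } \nu_{x_0} = 1 \text{ or } \nu_{x_0} = 0, \end{cases}$$

where $h(y) := \int L(y - u)\,P(du)$.

Lemma A.2. Let $x_0 \in (0,1)^2$.
(a) Let Assumptions (A1) and (B) hold. Then

$$\sup_{y \in \mathbb{R}} |E H'_{n,x_0}(y) - f'_{d,\nu_{x_0}}(y - m(x_0))| = o(1).$$

(b) Let Assumptions (A1') and (B') hold. Then

$$\sup_{y \in \mathbb{R}} |E H'_{n,x_0}(y) - h'_{d,\nu_{x_0}}(y - m(x_0))| = o(1).$$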



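Proof. (a) We provide the proof only for the case $x_0 \in \partial D$. The proof
for $x_0 \in (0,1)^2 \setminus \partial D$ is the same, but even simpler. From Lemma A.1,
Corollary A.1 and the Lipschitz continuity of $f'$, we obtain

$$\sup_{y \in \mathbb{R}} \Big| \frac{1}{n^2} \sum_{(i,j) \in J_{n,x_0}} K_{h_n}(x_0 - x_{ij})\, E \frac{\partial}{\partial y} L_{g_n}(y - Y_{ij}) - f'_{d,\nu_{x_0}}(y - m(x_0)) \Big|$$

$$= \sup_{y \in \mathbb{R}} \Big| \frac{1}{n^2} \sum_{(i,j) \in I_n^D(x_0)} K_{h_n}(x_0 - x_{ij}) \frac{\partial}{\partial y} \int \frac{1}{g_n} L\Big(\frac{y - m(x_{ij}) - u}{g_n}\Big) f(u)\,du - \int_{G(x_0)} K(u)\,du\; f'(y - m(x_0))$$

$$\qquad + \frac{1}{n^2} \sum_{(i,j) \in J_{n,x_0} \setminus I_n^D(x_0)} K_{h_n}(x_0 - x_{ij}) \frac{\partial}{\partial y} \int \frac{1}{g_n} L\Big(\frac{y - \mu(x_{ij}) - u}{g_n}\Big) f(u)\,du - \Big(1 - \int_{G(x_0)} K(u)\,du\Big) f'(y - \mu(x_0)) \Big| \tag{A.1}$$

$$\le \sup_{y \in \mathbb{R}} \Big\{ \frac{1}{n^2} \sum_{(i,j) \in I_n^D(x_0)} K_{h_n}(x_0 - x_{ij}) \int L(v)\, |f'(y - m(x_{ij}) - v g_n) - f'(y - m(x_0))|\,dv$$

$$\qquad + \frac{1}{n^2} \sum_{(i,j) \in J_{n,x_0} \setminus I_n^D(x_0)} K_{h_n}(x_0 - x_{ij}) \int L(v)\, |f'(y - \mu(x_{ij}) - v g_n) - f'(y - \mu(x_0))|\,dv \Big\} + o(1)$$

$$= o(1).$$

The proof of assertion (b) is based on an equality which is analogous to
equality (A.1) in the proof of (a). Note that we now have $E \frac{\partial}{\partial y} L(y - Y_{ij}) = h'(y - m(x_{ij}))$
for $(i,j) \in I_n^D(x_0)$ and $E \frac{\partial}{\partial y} L(y - Y_{ij}) = h'(y - \mu(x_{ij}))$ for
$(i,j) \in J_{n,x_0} \setminus I_n^D(x_0)$. □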

Lemma A.3. Let $x_0 \in (0,1)^2$ and let Assumptions (A1) and (B), or
Assumptions (A1') and (B'), hold. Then

$$\lim_{n \to \infty} P\Big( \sup_{y \in \mathbb{R}} |H'_{n,x_0}(y) - E H'_{n,x_0}(y)| < \varepsilon \Big) = 1 \quad \text{for all } \varepsilon > 0.$$

Proof. The proof is analogous to that of the one-dimensional case given
by Hillebrand and Müller [8] in Lemma 4 using the Fourier transform of $L'$.
The only difference is that $\varphi_n(u)$ must be defined as
$\varphi_n(u) = n^{-2} h_n^{-2} \sum_{k,j=1}^n K\big(\frac{x_0 - x_{kj}}{h_n}\big) e^{-iuY_{kj}}$ instead of
$\varphi_n(u) = n^{-1} h_n^{-1} \sum_{k=1}^n K\big(\frac{x_0 - x_k}{h_n}\big) e^{-iuY_k}$, where $i = \sqrt{-1}$. Then the rate
condition on $n$, $h_n$ and $g_n$ of Assumption (B5) is used instead of its
one-dimensional analogue for $g_n$ converging to zero. If
$g_n$ is fixed, then it is clear that we only need Assumption (B5'). Then the
result can also be shown without the Fourier transform: since $L'$ is bounded by
Assumption (B4'), we obtain pointwise convergence by using Chebyshev's
inequality and Corollary A.1. The Lipschitz continuity of $L'$ and $h'$ then
imply the uniform convergence. □

Proof of Theorem 1. The proof of Theorem 1(a) is analogous to that
of the one-dimensional case given by Hillebrand and Müller [8], replacing $\lambda$
by $\nu_{x_0}$. In particular, it is based on Lemma A.2(a) and Lemma A.3. For fixed
$g_n$, the proof is the same as for $g_n \to 0$ if $f$ is replaced by $h$. Lemma A.2(b)
is used instead of Lemma A.2(a) and $f_{d,\nu_{x_0}}$ is replaced by $h_{d,\nu_{x_0}}$. We see
that $h_{d,\nu_{x_0}}$ has the same properties as $f_{d,\nu_{x_0}}$ because of Assumptions (A2')
and (B4'). For this purpose, note, in particular, that the support of $h$ is
$(-a - g, a + g)$ and that $h$ is strongly unimodal. □

From the proofs of Lemma A.2 and Lemma A.3, we obtain Lemma A.4.

Lemma A.4. Let $C \subset (0,1)^2$ be a compact set and $U \subset (0,1)^2$ an open
set with $\partial D \cap C \subset U$.

(a) Under Assumptions (A1) and (B), we have

$$\lim_{n \to \infty} P\Big( \sup_{x \in C \setminus U} \sup_{y \in \mathbb{R}} |H_{n,x}(y) - f(y - m(x))| < \varepsilon \Big) = 1 \quad \text{for all } \varepsilon > 0.$$

(b) Under Assumptions (A1') and (B'), we have

$$\lim_{n \to \infty} P\Big( \sup_{x \in C \setminus U} \sup_{y \in \mathbb{R}} |H_{n,x}(y) - h(y - m(x))| < \varepsilon \Big) = 1 \quad \text{for all } \varepsilon > 0.$$

Proof of Corollary 1. Assertion (a) follows from the uniform convergence
of $H_{n,x}$ to $f(y - m(x))$ and $h(y - m(x))$, respectively, both of which
are strongly unimodal, where $h(y) := \int L(y - u)\,P(du)$ (see Lemma A.4).
Assertions (b) and (c) hold since the Lebesgue measure of $\partial D$ is zero, so
an open set $U \supset \partial D \cap C$ can be found with arbitrarily small Lebesgue
measure. Then under Assumptions (A2) and (A2'), respectively, because
of the bounded support of the error, $|m_n(x) - m(x)|$ is also bounded, so
the contribution of $U$ to the integrated error of $m_n$ can be made arbitrarily
small by means of an appropriate $U$. This also holds if $m_n(x)$ does not
converge to $m(x)$ on $U$. □

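Lemma A.5. Let $y_{u,n} := \min\{Y_{ij} : (i,j) \in R_{n,l}(x)\}$ and $y_{o,n} := \max\{Y_{ij} : (i,j) \in R_{n,l}(x)\}$.
Let $H_{n,x}$ denote the density estimate based on the trimmed data set and
$H^M_{n,x}$ the density estimate based on all observations in the window. Then

(a) $H'_{n,x}(y) = (H^M_{n,x})'(y)$ for $y \in (y_{u,n} + g, y_{o,n} - g)$ if $y_{u,n} + g < y_{o,n} - g$;

(b) $H'_{n,x}(y) \ge (H^M_{n,x})'(y)$ for $y \in [y_{u,n}, y_{u,n} + g]$ and $H'_{n,x}(y) \le (H^M_{n,x})'(y)$ for
$y \in [y_{o,n} - g, y_{o,n}]$;

(c) $H'_{n,x}(y) > 0$ for $y \in (y_{u,n} - g, y_{u,n})$ and $H'_{n,x}(y) < 0$ for $y \in (y_{o,n}, y_{o,n} + g)$;

(d) $H_{n,x}(y) = 0$ and $H'_{n,x}(y) = 0$ for $y \in \mathbb{R} \setminus (y_{u,n} - g, y_{o,n} + g)$.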

Proof of Theorem 2. The proof of Theorem 1 is based on the fact
(cf. [8]) that for all $\varepsilon_1, \varepsilon' > 0$, there exist $C_1$, $C_2$ and $n_0 \in \mathbb{N}$ such that

$$P(C_1 < Y_{i_0 j_0} - m(x_0) < C_2) > 1 - \varepsilon_1 \quad \text{for } n > n_0,$$

$$H'_{n,x}(y) > 0 \quad \text{on } [m(x_0) + C_1, m(x_0) - \varepsilon'],$$

$$H'_{n,x}(y) < 0 \quad \text{on } [m(x_0) + \varepsilon', m(x_0) + C_2].$$

Let $q_{l,n}$ be the $l$-quantile and $q_{1-l,n}$ the $(1 - l)$-quantile of $\{Y_{ij} : (i,j) \in J_{n,x_0}\}$.
Since quantiles are asymptotically linear, we have (see [17])

$$\lim_{n \to \infty} q_{l,n} = q_l \quad \text{and} \quad \lim_{n \to \infty} q_{1-l,n} = q_{1-l},$$

where $q_l$ and $q_{1-l}$ are the quantiles of the distribution given by
$f_{d,\nu_{x_0}}(y - m(x_0))$. Since

n^oo #J„^^. A([-l,l]^) 8 

there exists for every $x_0 \in (0,1)^2$ and for $\varepsilon'$ sufficiently small, some $n_1 > n_0$
such that for all $n > n_1$,

$$y_{u,n} < q_{l,n} < m(x_0) - \varepsilon' < m(x_0) + \varepsilon' < q_{1-l,n} < y_{o,n}.$$

By Lemma A.5, we have for all $n > n_1$,

$$H'_{n,x}(y) > 0 \quad \text{on } [\max\{y_{u,n}, m(x_0) + C_1\}, m(x_0) - \varepsilon'],$$

$$H'_{n,x}(y) < 0 \quad \text{on } [m(x_0) + \varepsilon', \min\{y_{o,n}, m(x_0) + C_2\}].$$

Hence, because $Y_{i_0 j_0} \in [m(x_0) + C_1, m(x_0) + C_2]$ with probability greater
than $1 - \varepsilon_1$, the closest local maximum of $H_{n,x}$ to $Y_{i_0 j_0}$ with $H_{n,x}(y) \ne 0$
lies in $[m(x_0) - \varepsilon', m(x_0) + \varepsilon']$. □

Fig. 22. $q_\varepsilon(y)$.

Proof of Theorem 3(a). It suffices to show the claim for $x \in (0,1)^2 \setminus \partial D$.
For arbitrarily small $\varepsilon > 0$, we will create a distribution which lies in
the $\varepsilon$-Lévy neighborhood of $P$ and has a multimodal density.

Let $c > 0$ be such that $\int_c^\infty f(y)\,dy > 0$. Further, let $\delta := -f'(c) > 0$.

Consider 




{y):={ V 26 



2^ \ 2 \ 2 



62) , if ye 

otherwise, 



1 3 

26'""+ 26 



where o := y |f and 6 := y Hf ; see Figure 22. 

It is easily verified that $q_\varepsilon$ is continuously differentiable and Lipschitz
continuous, satisfying $q'_\varepsilon(c) = \delta/\varepsilon$ and $\int q_\varepsilon(u)\,du = 1$. Hence,

$$f_\varepsilon(y) := (1 - \varepsilon) f(y) + \varepsilon q_\varepsilon(y)$$

is a density function with $f'_\varepsilon(c) = \varepsilon \cdot \delta > 0$ and the corresponding distribution
$P_\varepsilon$ lies in the $\varepsilon$-Lévy neighborhood of $P$ since

$$|F(y) - F_\varepsilon(y)| = \varepsilon \cdot |F(y) - G_\varepsilon(y)| \le \varepsilon,$$

where $G_\varepsilon(y)$ is the distribution function of the distribution with density
$q_\varepsilon(y)$ and $F_\varepsilon(y)$ is the distribution function of the distribution $P_\varepsilon$.

Note that $f'_\varepsilon(c + 3/(2b)) < 0$ since $q'_\varepsilon(c + 3/(2b)) = 0$. Since $f_\varepsilon$ is
differentiable, it has a local maximum between $c$ and $c + 3/(2b)$; see Figure 23. For
sufficiently small $\varepsilon > 0$, $c + 3/(2b)$ is close to $c$ and hence

$$\int_{c + 3/(2b)}^\infty f(u)\,du > 0.$$

Since Lemma A.2(a) and Lemma A.3 also hold for $f_\varepsilon(y)$, $H_{n,x}(y)$ has a local
maximum in $[m(x) + c, m(x) + c + 3/(2b)]$ with probability tending to one as
$n \to \infty$. If, additionally, the starting point is greater than $m(x) + c + 3/(2b)$,
then $m_n(x)$ will be greater than $m(x) + c$.

Let $(Q_\varepsilon)^{m_n(x)}$ denote the distribution of the estimator $m_n(x)$ if the
contaminated distribution $P_\varepsilon$ is the distribution of the residuals. Then if
$\varepsilon_1 > 0$ is the probability (vanishing as $n \to \infty$) that $H_{n,x}(y)$ has no local
maximum in $[m(x) + c, m(x) + c + 3/(2b)]$, we have

$$(Q_\varepsilon)^{m_n(x)}([m(x) + c, \infty)) \ge \int_{c + 3/(2b)}^\infty f_\varepsilon(u)\,du - \varepsilon_1.$$

Since by Theorem 1 we also have

$$(P)^{m_n(x)}([m(x) + c/2, \infty)) \le \varepsilon_2$$

for some $\varepsilon_2 > 0$ which vanishes as $n$ becomes large, we have, as shown in
Figure 24,

$$d_L\big((P)^{m_n(x)}, (Q_\varepsilon)^{m_n(x)}\big) \ge \min\Big\{ \int_{c + 3/(2b)}^\infty f(u)\,du - \varepsilon - \varepsilon_1 - \varepsilon_2,\ \frac{c}{2} \Big\}. \qquad □$$

Fig. 24. Distribution functions of $(P)^{m_n(x)}$ and $(Q_\varepsilon)^{m_n(x)}$.



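Proof of Theorem 3(b). Let $Q_\varepsilon \in U_{L,\varepsilon}(P)$ and let $G_\varepsilon$ be its
distribution function. Further, let $f_{\max} := \max_{y \in \mathbb{R}} f(y)$ and $h'_{G_\varepsilon}(y) = \int L'(y - u)\,dG_\varepsilon(u)$.
Because

$$F(y) - f_{\max} \cdot \varepsilon - \varepsilon \le F(y - \varepsilon) - \varepsilon \le G_\varepsilon(y) \le F(y + \varepsilon) + \varepsilon \le F(y) + f_{\max} \cdot \varepsilon + \varepsilon,$$

we have

$$|G_\varepsilon(y) - F(y)| \le f_{\max} \cdot \varepsilon + \varepsilon \tag{A.2}$$

for all $y \in \mathbb{R}$. Assumption (B4') then implies

$$|h'_{G_\varepsilon}(y) - h'(y)| = \Big| \int L''(u)\,(G_\varepsilon(y - u) - F(y - u))\,du \Big|$$

$$\le \int |L''(u)|\,|G_\varepsilon(y - u) - F(y - u)|\,du$$

$$\le \int |L''(u)|\,(f_{\max} \cdot \varepsilon + \varepsilon)\,du = C \cdot \varepsilon,$$

where $C := \int |L''(u)|\,du\,(f_{\max} + 1)$.

Let $\varepsilon_1 > 0$ be arbitrarily small. Let $\delta := \min\{|h'(y)| : y \in [-a, -\varepsilon_1] \cup [\varepsilon_1, a]\}$.
Obviously, $\delta > 0$.

Let $\varepsilon < \frac{1}{C} \cdot \frac{\delta}{2}$. Then

$$\sup_{y \in \mathbb{R}} |h'(y) - h'_{G_\varepsilon}(y)| < \frac{\delta}{2}.$$

Since Lemma A.2(b) and Lemma A.3 also hold for $G_\varepsilon$, we obtain that
for arbitrarily small $\varepsilon_2 > 0$, there exists $n_0 \in \mathbb{N}$ such that with probability
$1 - \varepsilon_2$,

$$\sup_{y \in \mathbb{R}} |H'_{n,x}(y) - h'_{G_\varepsilon}(y - m(x))| < \frac{\delta}{2}$$

for all $n > n_0$. Hence, with probability $1 - \varepsilon_2$,

$$\sup_{y \in \mathbb{R}} |H'_{n,x}(y) - h'(y - m(x))| < \delta$$

for all $n > n_0$. This implies:

(1) $H'_{n,x}(y) > 0$ on $[m(x) - a, m(x) - \varepsilon_1]$ and
$H'_{n,x}(y) < 0$ on $[m(x) + \varepsilon_1, m(x) + a]$;

(2) at least one zero of $H'_{n,x}(y)$, which is a local minimum of $-H_{n,x}(y)$, lies
in the $\varepsilon_1$-neighborhood of $m(x)$.

We conclude that if the starting point lies in $(m(x_{i_0 j_0}) - a, m(x_{i_0 j_0}) + a)$, the
closest zero of $H'_{n,x}(y)$ in the direction searched lies, for $n > n_0$, in
$[m(x) - \varepsilon_1, m(x) + \varepsilon_1]$, with probability larger than $1 - \varepsilon_2$. From (A.2), we have
that the probability of the starting point lying in $(m(x_{i_0 j_0}) - a, m(x_{i_0 j_0}) + a)$ is
greater than $1 - 2(f_{\max} + 1)\varepsilon$. Hence,

$$(Q_\varepsilon)^{m_n(x)}([m(x) - \varepsilon_1, m(x) + \varepsilon_1]) \ge 1 - \varepsilon_2 - 2(f_{\max} + 1)\varepsilon;$$

see Figure 25. Since by Theorem 1(a), we have

$$(P)^{m_n(x)}([m(x) - \varepsilon_1, m(x) + \varepsilon_1]) \ge 1 - \varepsilon_2,$$

it follows that for $n > n_0$,

$$d_L\big((P)^{m_n(x)}, (Q_\varepsilon)^{m_n(x)}\big) \le \max\{2\varepsilon_1, \varepsilon_2 + 2(f_{\max} + 1)\varepsilon\}. \qquad □$$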

Proof of Theorem 4. The assertion can be seen to follow from The- 
orem 3(b) with arguments similar to those used to show that Theorem 2 
follows from Theorem 1. □ 



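Proof of Theorem 5(b). Let $(y)_{J_{n,x}} \in \mathbb{R}^{\#J_{n,x}}$ and set

$$y_{\min} := \min\{y_{ij} : (i,j) \in J_{n,x}\}$$

and

$$y_{\max} := \max\{y_{ij} : (i,j) \in J_{n,x}\}.$$

Fig. 25. Distribution functions of $(P)^{m_n(x)}$ and $(Q_\varepsilon)^{m_n(x)}$.

Let $(z)_{J_{n,x}} \in \mathcal{Y}_{n,x,r}(y)$. Since at least $\#J_{n,x} - r$ elements of $(z)_{J_{n,x}}$ are contained
in $[y_{\min}, y_{\max}]$, we have

$$\min_{y \in \mathbb{R}} \sum_{k=1}^{\#J_{n,x} - r} S_{(k)}(y) \le (\#J_{n,x} - r)(y_{\max} - y_{\min})^2. \tag{A.3}$$

Let $\hat y \in \arg\min_{y \in \mathbb{R}} \sum_{k=1}^{\#J_{n,x} - r} S_{(k)}(y)$. Then

$$\hat y \in \Big[ y_{\min} - \sqrt{\#J_{n,x} - r}\,(y_{\max} - y_{\min}),\ y_{\max} + \sqrt{\#J_{n,x} - r}\,(y_{\max} - y_{\min}) \Big]$$

since otherwise there is at least one $z_{i_0 j_0}$ with $z_{i_0 j_0} = y_{i_0 j_0} \in [y_{\min}, y_{\max}]$ and

$$S_{i_0 j_0}(\hat y) = (y_{i_0 j_0} - \hat y)^2 > (\#J_{n,x} - r)(y_{\max} - y_{\min})^2,$$

which contradicts (A.3). If some

$$z_{i_1 j_1} \notin \Big[ y_{\min} - 2\sqrt{\#J_{n,x} - r}\,(y_{\max} - y_{\min}),\ y_{\max} + 2\sqrt{\#J_{n,x} - r}\,(y_{\max} - y_{\min}) \Big],$$

then $S_{i_1 j_1}(\hat y) = (z_{i_1 j_1} - \hat y)^2 > (\#J_{n,x} - r)(y_{\max} - y_{\min})^2$ and hence
$(i_1, j_1) \notin R_{n,l}(x)$.

This means that all $z_{ij}$ with $(i,j) \in R_{n,l}(x)$ lie in

$$\Big[ y_{\min} - 2\sqrt{\#J_{n,x} - r}\,(y_{\max} - y_{\min}),\ y_{\max} + 2\sqrt{\#J_{n,x} - r}\,(y_{\max} - y_{\min}) \Big].$$

From the definition of $m_{n,l}(x)$, it immediately follows that $m_{n,l}(x) = \hat m_{n,x,l}(z)$
lies in the support of $H_{n,x}(z)$, which is not greater than

$$\Big[ y_{\min} - 2\sqrt{\#J_{n,x} - r}\,(y_{\max} - y_{\min}) - g,\ y_{\max} + 2\sqrt{\#J_{n,x} - r}\,(y_{\max} - y_{\min}) + g \Big].$$

This proves the claim. □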